Capabilities' Substitutability and the "S" Curve of Export Diversity
Product diversity, which is highly important in economic systems, has been highlighted by recent
studies on international trade. Using detailed international trade data, we found an empirical
pattern, designated the "S-shaped curve", that relates economic size (logarithmic GDP) to export
diversity (the number of varieties of export products). As the economic size of a country begins
to increase, its export diversity initially increases exponentially, but over time this growth
slows and eventually reaches an upper limit. The interdependence between size and diversity thus
takes the shape of an S-shaped curve that can be fitted by a logistic equation. To explain this
phenomenon, we introduce a new parameter called "substitutability" into the list of capabilities
or factors of products in the tri-partite network model (i.e., the country-capability-product
model) of Hidalgo et al. When the substitutability is zero, the model reduces to Hidalgo et al.'s
original model, which fails to reproduce the S-shaped curve. As the substitutability increases,
however, the simulated data increasingly resemble an S-shaped curve. Therefore, the diversity
ceiling effect can be explained by the substitutability of different capabilities.

PACS numbers: 89.75.-k,89.75.Da 



I. INTRODUCTION 

Recent research on international trade has highlighted the diversity phenomenon, which is
generally ignored by conventional economic studies. However, both the amount and the types of
goods that a country produces affect economic growth [1-7]. Important new facts have been
uncovered by analyzing large amounts of high-quality data pertaining to international trade. For
example, there is a negative relationship between the diversification of countries and the
ubiquity of products [?]-[10]. To account for this phenomenon, Hidalgo et al. constructed a
tri-partite network model and argued that the capabilities, or non-tradable factors, that a
country possesses are the "building blocks" of its economy and determine its diversification.
Their model reproduces the negative correlation between diversity and ubiquity.

However, Hidalgo et al. did not explain what ingredients determine the non-tradable capabilities
of a country: although they tried to link the economic size or richness of a country with the
number of capabilities it has in paper [10], they did not give any empirical evidence because
these capabilities are not measurable. In contrast, because an economy's size, as measured by its
GDP, may be the most important datum in modern economics, it should correlate with a country's
degree of diversification [11]. Countries with large GDPs tend to produce and export more
diversified products, whereas countries with small GDPs usually have more homogeneous products
and markets [12-14]. This observation can be described quantitatively by an S-shaped curve that
relates a country's logarithmic GDP to its export diversity [1, 15-17]. This interdependence
between size and diversity is ubiquitous in global trade and economic systems; furthermore, it is
common in ecological systems [18-21]. The classical "area-species" relationship in ecology, which
is another example of the interdependence between size and diversity, also resembles an S-shaped
curve [?].

To this point, the theoretical understanding of the S-shaped curve relating economic size to
export diversity is still deficient. Therefore, this paper builds a model to reproduce this
size-diversity curve. Initially, we simply link the probability that a country possesses certain
capabilities with its economic size. In this way, we can investigate how economic size determines
a country's diversification. However, the interdependence between size and diversity in Hidalgo
et al.'s model is exponential, as they have noted; thus, once a country's economic size exceeds a
certain threshold, the country receives increasing returns. As a result, the number of product
types that the country is able to export increases without limit; otherwise, the country's
economy cannot overcome the so-called "quiescence trap" [?, 13]. However, the empirical data
reveal an upper limit of the diversity curve [?] that cannot be reproduced by the original
tri-partite model. Therefore, we introduce an important parameter into our model, namely, the
substitutability $s$ between different capabilities; its purpose is to relax the overly strict
condition on the number of capabilities that a product requires. Interestingly, in paper [10],
Hidalgo et al. mentioned the idea of substitutability between factors but did not develop it. In
this paper, we report that our model, which includes the substitutability $s$, can reproduce the
S-shaped curve of economic diversity.

This paper is organized as follows: in Section II we briefly introduce the S-shaped relationship
between the export diversity and economic size of countries, present our model for simulating
this empirical relationship, and derive both exact and approximate analytic solutions. In
Section III we show the simulation and analytic results (both exact and approximate), which
resemble the empirical "S" curve. Furthermore, we discuss how the key parameters in our model
affect the shapes of the fitted curves.



II. METHOD
A. The S-shaped relationship

The S-shaped relationship between logarithmic GDP and export diversity can be derived from the
empirical data we have collected. The world GDP statistics are from the World Bank's website
(www.worldbank.org), and the export diversity data are from the NBER-UN world trade database
(www.nber.org/data). The former data set records the GDPs, populations and other economic data of
240 countries during 1971-2006; the latter includes the detailed bilateral trade flows of
approximately 150 countries and 800 types of products (according to the SITC4 classification
standard) during 1962-2000. In this paper, we only show the "S" curve for 1995. A more detailed
discussion of empirical "S" curves for other years can be found in previous work [1].

The empirical data show a strong dependence between logarithmic GDP and the number of export
types in FIG. 1A ($R^2 = 0.87$). The empirical data can be fitted by a logistic function; such
functions are widely used in many disciplines [?, 24, 25]:

\[
D_i = \frac{A}{1 + e^{-k (X_i - x_m)}}
\tag{1}
\]

where $D_i$ stands for the number of categories of goods (each category is represented by a
distinct 4-digit code) that country $i$ exports and $X_i$ represents the logarithmic GDP of
country $i$. Furthermore, $A$, $k$ and $x_m$ are parameters of the logistic function. The
estimated values are shown in the legend of FIG. 1A.

From FIG. 1 we observe that all countries can be divided into three groups: small countries in
the lower part of the "S" curve (e.g., Liberia), intermediate countries (e.g., Iran and New
Zealand) in the "accelerating" part of the curve, and large countries (e.g., the USA) at the top
of the curve. The small countries in the first group can export very few varieties of goods if
their sizes do not exceed a certain threshold (the "quiescence trap" [13]). The second segment of
the "S" curve represents an exponential increase, in which countries have very different types of
export products. However, the accelerating-growth effect stops at the uppermost part of the "S"
curve due to the ceiling effect of diversity, as large countries' diversification levels are not
as high as an exponential curve would indicate. This pattern in the relationship between economic
size and export diversity is very stable in all of the years of our data set (see [?]).



B. Model 

It is important to consider why this interdependence between size and diversity in international
trade exists. In paper [?], Hidalgo et al. proposed a tri-partite network model to account for
various facts regarding export diversity and product ubiquity. In their model, the nodes in the
first and third layers are the countries and their products, respectively, whereas the nodes in
the hidden layer between countries and products represent non-tradable elements, factors or
capabilities, e.g., management skills, raw materials, regulation and property rights. Countries
need to have these elements locally available to produce goods.

Following Hidalgo et al.'s model, we hope to account for the S-shaped relationship by
constructing a modified model, which also assumes that each product requires some non-tradable
capabilities and that each country can export a product if and only if it possesses the required
capabilities.

First, however, we link the logarithmic GDP with the degrees of the nodes that represent
countries, because our purpose is to explain the relationship between economic size and economic
diversity. That is, the number of links attached to country $c$ is proportional to its
logarithmic GDP $X_c$, but the end nodes of these links are selected at random among the
capability nodes; a link between country $c$ and a capability node means that the country
possesses that capability (see FIG. 2A). Furthermore, the links between capabilities and products
are randomly assigned, subject only to the given connection density $q$. The tri-partite network
is constructed in this way.
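
The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `build_network`, its arguments and the fixed random seed are hypothetical, and the linear rescaling of log GDP to a link probability is the one the text adopts further on for $r_i$.

```python
import random


def build_network(log_gdps, n_capabilities, n_products, q, seed=0):
    """Return (country_caps, product_reqs) as lists of sets of capability indices."""
    rng = random.Random(seed)
    x_min, x_max = min(log_gdps), max(log_gdps)
    if x_max == x_min:
        raise ValueError("need at least two distinct log-GDP values")
    country_caps = []
    for x in log_gdps:
        # Assumed linear map of log GDP onto a link probability in [0, 1].
        r = (x - x_min) / (x_max - x_min)
        country_caps.append({k for k in range(n_capabilities) if rng.random() < r})
    # Each product requires each capability independently with probability q.
    product_reqs = [
        {k for k in range(n_capabilities) if rng.random() < q}
        for _ in range(n_products)
    ]
    return country_caps, product_reqs


caps, reqs = build_network([8.0, 10.0, 12.0], n_capabilities=200, n_products=720, q=0.048)
print([len(c) for c in caps])  # number of capabilities owned by each country
```

With this rescaling the smallest country owns no capabilities and the largest owns all of them, so the expected number of owned capabilities grows linearly with log GDP in between.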

In Hidalgo et al.'s model, country $c$ can export product $p$ if and only if the paths from $c$
to $p$ pass through all of the hidden nodes that are connected to $p$; that is, all of the
capabilities required to produce $p$ must be possessed by country $c$. However, this mechanism
cannot reproduce the "S" curve that relates size and diversity, and as a result, we replace it
with a new rule.

We introduce one important parameter called the "average substitutability rate" (or
"substitutability") $s$, which represents the proportion of the capabilities required to produce
product $p$ that can be replaced by other available capabilities. Country $c$ can export product
$p$ if and only if the paths from $c$ to $p$ cover at least $(1 - s) \times 100\%$ of the
capabilities required by $p$ (i.e., of the hidden nodes that are connected to $p$). Hence, among
the requisite capabilities, only a fraction $1 - s$ is necessary and the remaining fraction $s$
is substitutable on average. When the substitutability $s$ is 0, all of the capabilities are
necessary, and we recover Hidalgo et al.'s model. When $s$ increases, however, more countries can
export products as diverse as those of the largest countries. As a result, an S-shaped curve
between logarithmic GDP and diversity is obtained.

Figure 1. The S-shaped relationship between log(GDP) ($X_i$) and the export diversity $D_i$ of
countries. (A) The original data and the logistic fitting for all countries in 1995. (B) The blue
scatter points are the simulation results from our tri-partite model. The red curve is the
analytic result of Equation (11) (see Section II of the main text), and the black curve is the
logistic fitting.

For example (see FIG. 2A), suppose that country C2 can only export product P2 when $s = 0$: C2
cannot export P1 because all of the capabilities connected to P1 (namely, A1, A2, and A3) would
have to be covered by the paths from C2 to P1, yet A1 is not covered. When $s$ increases to 0.5,
only 50% of the capabilities must be covered by the paths. Therefore, C2 can export P1 because
more than half of the capabilities required by P1 have been covered.
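
The export rule in this example can be written as a small Python check. This is a minimal sketch: the function name `can_export` is hypothetical, and the assumption that C2 owns exactly A2 and A3 is one reading of the example (the text only states that A1 is not covered and that more than half of P1's capabilities are).

```python
import math


def can_export(country_caps, required_caps, s):
    """True if the country owns at least a fraction (1 - s) of the required capabilities."""
    # The small offset guards against floating-point round-up just above an integer.
    needed = math.ceil((1.0 - s) * len(required_caps) - 1e-9)
    return len(country_caps & required_caps) >= needed


# Assumed capability sets for the example: C2 owns A2 and A3 but not A1.
c2 = {"A2", "A3"}
p1 = {"A1", "A2", "A3"}

print(can_export(c2, p1, s=0.0))  # False: all three capabilities are needed
print(can_export(c2, p1, s=0.5))  # True: two of the three suffice
```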

In general, we consider $N$ countries whose logarithmic GDPs are $(X_1, X_2, \ldots, X_N)$. The
adjacency matrix between countries and capabilities is $C_{ik}$, and its elements are randomly
assigned:

\[
C_{ik} =
\begin{cases}
1, & \text{with probability } r_i, \\
0, & \text{otherwise,}
\end{cases}
\tag{2a, 2b}
\]

where the subscript $k$ runs from 1 to $N_a$ (the total number of capabilities we consider). To
link the number of capabilities that a country has with this country's GDP, we assume that the
probability $r_i$ is proportional to the logarithmic GDP $X_i$ of country $c_i$; symbolically,

\[
r_i \propto X_i.
\tag{3}
\]

Actually, any linear relationship between $r_i$ and $X_i$ can produce an S-shaped curve. In our
model, we let $r_i = (X_i - X_m)/(X_M - X_m)$ to reduce the number of parameters as much as
possible, where $X_M$ and $X_m$ are the largest and smallest values of log(GDP) in the list of
countries, respectively. Additionally, we assign connections from product $p_j$ to the required
capabilities with probability $q$. The matrix $P_{kj}$ represents the connections between these
two layers:

\[
P_{kj} =
\begin{cases}
1, & \text{with probability } q, \\
0, & \text{otherwise.}
\end{cases}
\tag{4a, 4b}
\]



In the above equation, $j$ runs from 1 to $P$ (the total number of possible products). Suppose
the adjacency matrix between countries and products is $M_{ij}$. We assume that country $c_i$ can
export product $p_j$ (i.e., that $M_{ij} = 1$) if and only if the capabilities owned by the
country cover at least a fraction $1 - s$ of the total number of capabilities required by the
product:

\[
M_{ij} =
\begin{cases}
1, & \text{if } \sum_{k=1}^{N_a} C_{ik} P_{kj} \ge (1 - s) \sum_{k=1}^{N_a} P_{kj}, \\
0, & \text{otherwise.}
\end{cases}
\tag{5a, 5b}
\]

Finally, the export diversity $D_i$ is defined as the total number of types of products that
country $c_i$ exports, namely,

\[
D_i = \sum_{j=1}^{P} M_{ij}.
\tag{6}
\]

In the simulations, all $X_i$ are defined by the real-world log(GDP) data that we have collected,
the number of countries is $N$, and $P$ is defined as the total number of products in our data
set. The number of capabilities $N_a$, the link density $q$ between the capabilities and
products, and the substitutability rate $s$ are the parameters. In each simulation, we generate a
tri-partite network according to the rules introduced above, and the relationship between $X_i$
and $D_i$ can then be derived.

Figure 2. A concept graph of the three-layer network and the analytic solution. (A) This
three-layer network is designed to depict the mapping between countries and products. The nodes
in the first layer represent countries, and the nodes' sizes represent their logarithmic GDPs.
The nodes in the second and third layers represent the capabilities and products, respectively. A
country can produce more products when the substitutability s increases. (B) Provided that
country c has L distinct capabilities (such capabilities are colored in red), the probability
that this country produces product p is determined by the mapping between these capabilities and
product p. (C) The m capabilities required by product p can be divided into two groups based on
whether they match the capabilities that are owned by country c.



C. Analytic solution

Before giving the simulation results, we first derive the analytic relation between $D_i$ and
$X_i$ so that the mathematical essence of this model can be grasped.

In our model, all of the connections among the countries, capabilities and products are
independent. Therefore, by analyzing the probability that a typical country $c_i$ exports a
specific product $p_j$, we can derive its export diversity:

\[
D_i = P \pi_i,
\tag{7}
\]

where $\pi_i$ is the probability that country $c_i$ can produce any specific product.

Suppose the numbers of capabilities that country $c_i$ possesses and that product $p_j$ requires
are $k_{c_i}$ and $k_{p_j}$, respectively. Because the density of capabilities that a country has
is proportional to $X_i$, the expectation value of $k_{c_i}$ is also proportional to $X_i$. To
simplify our discussion, we treat $k_{c_i}$ as a predetermined value and use $k_{c_i}$ to
represent its expectation value.

The probability $\pi_i$ can be decomposed by conditioning on $k_{p_j}$ as follows:

\[
\pi_i = \sum_{m=0}^{N_a} \Pr\{k_{p_j} = m\} \, \Pr\{c_i \to p_j \mid k_{p_j} = m\},
\tag{8}
\]

where $\Pr\{c_i \to p_j \mid k_{p_j} = m\}$ is the probability that country $c_i$ exports product
$p_j$ given that the degree $k_{p_j}$ of $p_j$ is $m$.

First, because a product has a probability $q$ of requiring each capability, the number of
capabilities $k_{p_j}$ required by $p_j$ follows a binomial distribution. Hence, the probability
that node $p_j$ requires $m$ distinct capabilities is:

\[
\Pr\{k_{p_j} = m\} = \binom{N_a}{m} q^m (1 - q)^{N_a - m}.
\tag{9}
\]



Second, we derive $\Pr\{c_i \to p_j \mid k_{p_j} = m\}$. If the numbers of connections of nodes
$c_i$ and $p_j$ are given, then the situation is as depicted in FIG. 2. The probability
$\Pr\{c_i \to p_j \mid k_{p_j} = m\}$ is the number of connection configurations in which the set
of capabilities connected with both $c_i$ and $p_j$ contains at least $\lceil (1 - s) m \rceil$
elements, divided by the number of all possible connection configurations. This number is
computed by means of the following steps:

i) There are $N_a (N_a - 1)(N_a - 2) \cdots (N_a - m + 1)$ (i.e., the permutation number
$P_{N_a}^{m}$) ways in which the product $p_j$ can be connected to $m$ capabilities.

ii) All of the $m$ capabilities that are required by product $p_j$ can be divided into two groups
based on whether they are owned by country $c_i$. Without loss of generality, suppose there are
$n$ capabilities in the first group (those possessed by $c_i$, i.e., A3 and A4 in FIG. 2C) and
$m - n$ capabilities in the second group (i.e., A2 in FIG. 2C).

iii) There are $P_{k_{c_i}}^{n} = k_{c_i}(k_{c_i} - 1)(k_{c_i} - 2) \cdots (k_{c_i} - n + 1)$
ways to match the capabilities in the first group.

iv) Similar to iii), the number of ways in which the capabilities in the second group can be
matched with the $N_a - k_{c_i}$ capabilities that are not owned by country $c_i$ is
$P_{N_a - k_{c_i}}^{m - n}$.

v) There are $\binom{m}{n}$ ways to select $n$ elements from $m$ capabilities.

Because there must be at least $\lceil (1 - s) m \rceil$ capabilities in the first group, we
obtain:

\[
\Pr\{c_i \to p_j \mid k_{p_j} = m\}
  = \sum_{n = n_1}^{m} \binom{m}{n}
    \frac{P_{k_{c_i}}^{n} \, P_{N_a - k_{c_i}}^{m - n}}{P_{N_a}^{m}}.
\tag{10}
\]

In the above equation, the summation index $n$ begins at
$n_1 = \max(\lceil (1 - s) m \rceil, k_{c_i} + m - N_a)$ because the number of capabilities owned
by $c_i$ but not required by $p_j$, namely $k_{c_i} - n$, cannot exceed $N_a - m$, so $n$ must be
at least $k_{c_i} + m - N_a$.
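
The counting argument in steps i) to v) can be checked numerically. The sketch below evaluates the conditional export probability from permutation counts with Python's `math.perm` and `math.comb`; the function name `prob_export_given_m` and the small example values are hypothetical, chosen only so that the result can be verified by hand.

```python
import math


def prob_export_given_m(n_a, k_c, m, s):
    """P(country owning k_c of n_a capabilities covers >= ceil((1 - s) m) of the m required)."""
    # Lower summation limit n_1; the small offset guards against float round-up.
    n_min = max(math.ceil((1.0 - s) * m - 1e-9), k_c + m - n_a, 0)
    total = math.perm(n_a, m)  # step i): ordered ways to connect the product
    favourable = sum(
        # steps v), iii) and iv): choose the n owned slots, then fill both groups
        math.comb(m, n) * math.perm(k_c, n) * math.perm(n_a - k_c, m - n)
        for n in range(n_min, m + 1)
    )
    return favourable / total


# Hypothetical small case: 10 capabilities, the country owns 6, the product needs 3.
print(prob_export_given_m(10, 6, 3, s=0.0))  # all 3 must be owned: C(6,3)/C(10,3) = 1/6
print(prob_export_given_m(10, 6, 3, s=0.5))  # at least 2 must be owned: 2/3
```

The ratio is the tail of a hypergeometric distribution, which is why the strict case reduces to C(6,3)/C(10,3).
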
By inserting Equations (10) and (9) into Equations (8) and (7), we can derive the following:

\[
D_i = P \sum_{m=0}^{N_a} \binom{N_a}{m} q^m (1 - q)^{N_a - m}
      \sum_{n = n_1}^{m} \binom{m}{n}
      \frac{P_{k_{c_i}}^{n} \, P_{N_a - k_{c_i}}^{m - n}}{P_{N_a}^{m}},
\tag{11}
\]

where $n_1$ is the lower summation limit of Equation (10). Notice that $k_{c_i} \propto X_i$,
which implies that $D_i$ is actually a function of $X_i$.

Although Equation (11) accurately models the relation between $X_i$ and $D_i$, it is complex;
however, we can simplify it to an approximate but compact form. If we allow duplicate links to
exist in the network, then the permutations in Equation (11) can be replaced by powers, i.e.,
each permutation $P_x^y$ can be replaced with $x^y$. Furthermore, we can use
$\lceil (1 - s) m \rceil$ to approximate $\max(\lceil (1 - s) m \rceil, k_{c_i} + m - N_a)$;
then, we have

\[
D_i \approx P \sum_{m=0}^{N_a} \binom{N_a}{m} q^m (1 - q)^{N_a - m}
      \sum_{n = \lceil (1 - s) m \rceil}^{m} \binom{m}{n}
      \frac{k_{c_i}^{n} \, (N_a - k_{c_i})^{m - n}}{N_a^{m}}.
\tag{12}
\]

When $s = 0$, Equation (12) becomes $D_i = P \left( q k_{c_i} / N_a + 1 - q \right)^{N_a}$
according to the binomial theorem. Because this expression is the same as the relation between
capability and diversity derived in paper [10], our result is actually a general definition of
$D_i$ in terms of $X_i$ in which substitutability between capabilities is allowed.



III. RESULTS
A. The S-shaped curve

In the previous sections we introduced our model. Here, we give our simulation and numeric
results.

In FIG. 1B, the blue circles represent the simulation results, and the red squares represent both
the numeric results of Equation (11) and the logistic fitting. When we set the number of
capabilities ($N_a$) at 200 [8], the number of products ($P$) at 720 (which is also the maximum
product diversity of the countries in our empirical data), the link density between capabilities
and products ($q$) at 0.048, and the substitutability ($s$) at 0.6, we obtain an "S" curve that
resembles the empirical curve of best fit for the data recorded in 1995. Furthermore, we use the
logistic Equation (1) to fit both the empirical and theoretical curves and compare their fitting
parameters. The parameters are similar: $A = 749.53$, $k = 1.932$, $x_m = 10.625$ for the
empirical curve, and $A = 737.47$, $k = 1.938$, $x_m = 10.753$ for the theoretical curve.
Therefore, we conclude that our model reproduces the empirical S-shaped relationship well.



B. Parameter Space

Although there are several parameters, the most important ones are $q$ and $s$. In fact, we can
fix the other parameters (specifically, we select $P = 720$ and $N_a = 200$) and study how $q$
and $s$ affect the shape of the "S" curve. From the definitions introduced above, the parameter
$q$ determines the number of capabilities that are required by products. Thus, we can understand
$q$ as the average complexity of all products. As $q$ increases, countries find it more difficult
to make products, and as a result, the S-shaped curve becomes steeper and the diversity gap
between rich countries and poor countries becomes larger (see FIG. 3A and [11, 12]).



Figure 3. The S-shaped relationship, which depends on our model's parameters. (A) These S-shaped
curves change if the parameter q changes and the other parameters are kept fixed; when q
increases, the S-shaped curve becomes steeper and the trade gap among countries increases.
(B) These S-shaped curves change if the parameter s changes and the other parameters are kept
fixed. The ceiling of the S-shaped curve emerges when s increases sufficiently.

The parameter $s$ represents the average degree of substitutability of the capabilities required
by products: a country must possess a proportion $1 - s$ of the capabilities required by a
product if it wants to export that product. From FIG. 3B, no ceiling for the S-shaped curve can
be observed when $s$ is small, because in this case countries need to have locally almost all of
the capabilities required by products. In particular, when $s$ is zero, the simulation result is
the same as the result of Hidalgo et al.'s model. In contrast, a ceiling for the export diversity
emerges as $s$ increases, because more of the capabilities that are required by products can be
replaced by other available capabilities that are owned by the producing countries. Hence, the
resources (for instance, labor, skills, funds and so on) that can be used to produce the goods in
question are more diverse.
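
The effect of $s$ on the ceiling can be explored with the approximate analytic form of the model (the double binomial sum obtained when duplicate links are allowed). The Python sketch below is illustrative only: the function name `approx_diversity` is hypothetical, its argument `r` stands for the fraction $k_{c_i}/N_a$ of capabilities that a country owns, and the default values are the ones quoted in the text ($N_a = 200$, $P = 720$, $q = 0.048$, $s = 0.6$).

```python
import math


def approx_diversity(r, n_a=200, n_products=720, q=0.048, s=0.6):
    """Approximate export diversity for a country owning a fraction r of the capabilities."""
    total = 0.0
    for m in range(n_a + 1):
        # Probability that a product requires exactly m capabilities (binomial).
        p_m = math.comb(n_a, m) * q**m * (1.0 - q) ** (n_a - m)
        # Probability that at least ceil((1 - s) m) of those m are owned.
        n_min = math.ceil((1.0 - s) * m - 1e-9)
        p_cover = sum(
            math.comb(m, n) * r**n * (1.0 - r) ** (m - n) for n in range(n_min, m + 1)
        )
        total += p_m * p_cover
    return n_products * total


for s in (0.0, 0.6):
    print(s, [round(approx_diversity(r, s=s), 1) for r in (0.2, 0.4, 0.6, 0.8, 1.0)])
```

With $s = 0$ most of the growth in diversity occurs near $r = 1$, so no plateau appears; with $s = 0.6$ the curve is already close to its maximum well before $r = 1$, which is the ceiling effect described above.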

More numeric experiments were carried out to investigate how the shape of the "S" curve changes
with the combination of $q$ and $s$. The results show that the parameter space (i.e., the
combinations of $q$ and $s$) can be decomposed into several regions, as shown in FIG. 4. The blue
region in FIG. 4 represents the combinations of $q$ and $s$ that generate a size-diversity curve
with an obvious "S" shape. The curves in every other parameter region, however, resemble only
part of a twisted "S" curve. We can distinguish these regions by considering the third-order
derivative $\partial^3 D_i / \partial X_i^3$: if the curve of $\partial^3 D_i / \partial X_i^3$
is separated by the x-axis into three segments, then the original curve is clearly S-shaped.
However, if the curve of $\partial^3 D_i / \partial X_i^3$ has only one or two segments divided
by the x-axis, then the original curve is not S-shaped.
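
A minimal Python sketch of this criterion follows. It approximates the third derivative by finite differences on a uniform grid and counts the segments of constant sign. The helper names, the grid and the two test curves (a logistic curve with parameters of roughly the fitted magnitude, and a pure exponential) are hypothetical illustrations and are not taken from the paper's experiments.

```python
import math


def third_differences(ys):
    """Third-order finite differences, proportional to the third derivative on a uniform grid."""
    return [ys[i + 3] - 3 * ys[i + 2] + 3 * ys[i + 1] - ys[i] for i in range(len(ys) - 3)]


def count_sign_segments(values, tol=1e-9):
    """Count maximal runs of same-signed entries, ignoring entries within tol of zero."""
    signs = [1 if v > 0 else -1 for v in values if abs(v) > tol]
    return sum(1 for i, sgn in enumerate(signs) if i == 0 or sgn != signs[i - 1])


xs = [6.0 + 0.1 * i for i in range(91)]  # hypothetical log-GDP grid from 6 to 15
logistic_curve = [720.0 / (1.0 + math.exp(-1.9 * (x - 10.6))) for x in xs]
exponential_curve = [math.exp(0.5 * x) for x in xs]

print(count_sign_segments(third_differences(logistic_curve)))     # 3 segments: S-shaped
print(count_sign_segments(third_differences(exponential_curve)))  # 1 segment: not S-shaped
```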

To quantitatively characterize the curves that relate $X_i$ and $D_i$, we use the logistic
function (Equation (1)) to fit the curves and show how the parameters (i.e., $k$ and $x_m$)
change when the combination of $q$ and $s$ is varied (FIG. 5). However, we only show the regions
of $q$ and $s$ that generate a stable "S" shape because the logistic fitting would otherwise give
unreasonable fitting parameters. From FIG. 5 we observe that, whereas the slope ($k$) of the
$X_i$-$D_i$ curve is influenced mainly by $q$ and not $s$, the center position of the curve
($x_m$) is determined mainly by $s$.
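
To make the roles of the two fitted parameters concrete, the short Python sketch below evaluates the logistic function of Equation (1) with the values quoted in the text for the empirical 1995 curve. The function and variable names are hypothetical. At $X = x_m$ the curve reaches half of its ceiling $A$, and its slope there is $A k / 4$, so $x_m$ sets the position of the curve and $k$ its steepness.

```python
import math


def logistic(x, a, k, x_m):
    """Logistic curve D = A / (1 + exp(-k (X - x_m)))."""
    return a / (1.0 + math.exp(-k * (x - x_m)))


# Fitted values quoted in the text for the empirical 1995 curve.
A, K, X_M = 749.53, 1.932, 10.625

half = logistic(X_M, A, K, X_M)  # value at the centre of the curve
h = 1e-5
slope = (logistic(X_M + h, A, K, X_M) - logistic(X_M - h, A, K, X_M)) / (2 * h)

print(half, A / 2)       # the two values agree
print(slope, A * K / 4)  # central-difference slope versus the closed form
```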



IV. DISCUSSION 

In general, this paper discusses how a revision of Hidalgo et al.'s tri-partite network model can
generate the observed S-shaped curve of global export diversity as a function of the economic
sizes of countries. In this model, we found that the substitutability $s$ is an important
parameter that can account for the ceiling effect in the S-shaped curve. When $s$ increases, the
size-diversity curve increasingly resembles a logistic curve and departs from the exponential
function predicted by paper [10]. Therefore, we claim that the substitutability between different
capabilities cannot be ignored, because the empirical size-diversity curve is S-shaped.




However, this work is only a first step toward a fuller understanding of export diversity in
international trade. The S-shaped curve that relates diversity and economic size shows only
aggregate information regarding a country's export diversity. Additional studies that investigate
the distribution of different products in a given country are worth conducting in the future.

Figure 4. Different curves and their third-order derivatives in parameter space. The different
regions of parameters q and s [...] order derivatives.

Figure 5. The parameters $x_m$ (A) and $k$ (B) of the logistic function depend on the parameters
q and s in our model. Only the parameter regions that can generate "S" curves are shown; the blue
areas are blank.



